The sequence of the spike (also called peplomer or E2) protein gene of the Mebus strain of bovine coronavirus (BCV)
was obtained from cDNA clones of genomic RNA. The gene sequence predicts a 150,825 mol wt apoprotein of 1363 
amino acids having an N-terminal hydrophobic signal sequence of 17 amino acids, 19 potential N-linked glycosylation 
sites, a hydrophobic anchor sequence of approximately 17 amino acids near the C terminus, and a hydrophilic cysteine-rich
C terminus of 35 amino acids. An internal Lys-Arg-Arg-Ser-Arg-Arg sequence predicts a protease cleavage site
between amino acids 768 and 769 that would separate the S apoprotein into S1 and S2 segments of 85,690 and 65,153
mol wt, respectively. Amino terminal amino acid sequencing of the virion-derived gp100 spike subunit confirmed the 
location of the predicted cleavage site, and established that gp120 and gp100 are the glycosylated virion forms of the 
S1 and S2 subunits, respectively. Sequence comparisons between BCV and the antigenically related mouse hepatitis 
coronavirus revealed more sequence divergence in the putative knob region of the spike protein (S1) than in the stem
region (S2). © 1990 Academic Press, Inc.


The bovine coronavirus (BCV) is an important cause 
of neonatal calf diarrhea (74, 26) and may also be the 
cause of winter dysentery in adult cattle (30). The 
mechanisms by which BCV causes disease and persis- 
tent infection are not understood, nor are current vac- 
cines universally regarded as effective. Toward these 
ends, we have begun a detailed study of the BCV pro- 
tein and genome structure. 

BCV is comprised of four major structural proteins 
(17). These are (i) a 200-kDa spike (peplomer) glycopro- 
tein (S), that exists on the virion as cleaved subunits of 
approximately 120 and 100 kDa, (ii) a 140-kDa glycoprotein
(HE) that has both hemagglutinating (78) and
esterase (37) activities, and which is comprised of two
identical, disulfide-linked 65-kDa subunits (70, 12, 76,
28), (iii) a 26-kDa integral membrane glycoprotein (M)
(27), and (iv) an internal phosphorylated nucleocapsid
protein (N) (27). Of these, the S protein is presumed to
be the major structure by which coronaviruses attach
to cells and initiate infection (reviewed by Spaan et al. 
(34)). The HE protein, however, may also bind to cells 
to initiate infection, and for BCV, the relative impor- 
tance of these two proteins in initiating infection is not 
known. Both S and HE are probably important in induc- 
ing immunity since antibodies to each are known to 
neutralize virus infectivity in cell culture and in calves 
(8, 9). S and HE, therefore, may both be useful in developing
effective engineered vaccines against BCV.

cDNA cloning of BCV genomic RNA was accomplished
essentially as previously described (77, 27) except
that random 5-mer oligodeoxynucleotides (Pharmacia)
and 17-mer oligodeoxynucleotides of specific
sequences were used as primers for first-strand synthesis.
Clones were mapped relative to one another
and to the 3’ end of the genome using a matrix spot
hybridization technique. Some clones were sequenced
by the chemical method of Maxam and Gilbert (25) and
some by the dideoxynucleotide-induced chain termination
method of Sanger (37) as described by Kraft et al.
(79) using Sequenase enzyme (United States Biochemicals).
For much of the sequencing, restriction endonuclease
fragments were subcloned into the pGEM4Z
vector (Promega) and forward and reverse sequencing
primers for the pGEM vectors were used. Sequence-specific
oligodeoxynucleotides were also synthesized
and used for sequencing within certain regions of the
large clones.

Fig. 1. Gene map of the BCV genome, cDNA clone positions, and strategy for sequencing the S gene.

The amino-terminal ends of purified gp120 and 
gp100 subunits were subjected to sequencing by the 
method of Matsudaira (24). Unlabeled BCV was purified
by isopycnic sedimentation in sucrose gradients
and the proteins were electrophoretically separated after
reduction in 2-mercaptoethanol (77) and electroblotted
(73) onto polyvinylidene difluoride membrane
(24). Proteins were visualized by staining with
Coomassie brilliant blue and the gp120 and gp100
bands were excised and shipped to Dr. Matsudaira for
analysis.

Complete sequencing of clone MA7 which extends 
4.2 kilobases from the 3’ end of the genome (Fig. 1) 
revealed a continuous open reading frame located on 
the 5’ side of the ORF for a potential 4.9-kDa protein 
(Abraham et al., to be published elsewhere). The deduced
amino acid sequence of the extended ORF demonstrated
high sequence similarity to the C-terminal
end of the antigenically related MHV-A59 (22) and
MHV-JHM (32) S proteins, both antigenic homologs of
the BCV S protein (73). These data suggested that the
S protein gene of BCV lies in the same relative position
on the genome as does the spike protein gene of MHV.
To complete the sequencing of the S gene, both 
strands of three clones, 11, HPA2, and G6, generated 
by random priming, and three clones, LK5, LP6 and 
29, generated by specific priming, were sequenced 
(Fig. 1). 

The total sequence for the putative S ORF extended 
to a position 7.4 kb from the 3’ end of the genome and 
contained 4089 bases (Fig. 2). We conclude this ORF 
to be the S gene since it potentially encodes a 1363 
amino acid protein of 150,825 Da, the approximate 
size of the unglycosylated spike precursor (70), and because
its deduced amino acid sequence shows extensive
sequence similarity throughout with the S proteins
of both strains of MHV. Five other open reading frames
ranging in size from 34 to 66 amino acids were also
found within the S gene sequence in the plus one reading
frame, but their significance is not known at this
time. The putative S ORF is preceded immediately upstream
(beginning at base 12 in Fig. 2) by the consensus
CYAAAC sequence thought to play a role in leader
priming of coronavirus transcription. The sequence is
also found three times within the S ORF, beginning at
positions 817, 1667, and 3776, but it is not established
that transcripts initiate at any of these sites.
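
The coordinates quoted for the S ORF can be cross-checked with a little arithmetic. The following Python sketch is illustrative only: the 20-base string is the start of the Fig. 2 sequence as read here (base 12 is assumed to be C, consistent with the CYAAAC motif reported at that position), and the function names are hypothetical.

```python
import re

# First 20 bases of the Fig. 2 sequence (assumption: base 12 read as C).
FIG2_START = "TAGACCATAATCTAAACATG"

# Consensus CYAAAC, where Y is a pyrimidine (C or T).
CONSENSUS = re.compile(r"C[CT]AAAC")


def one_based_starts(pattern, seq):
    """Return 1-based start positions of every match of pattern in seq."""
    return [m.start() + 1 for m in pattern.finditer(seq)]


def orf_coordinates(start_base, coding_bases):
    """Return (codon count, last coding base, stop codon span) for an ORF."""
    codons, remainder = divmod(coding_bases, 3)
    if remainder:
        raise ValueError("coding length is not a multiple of three")
    last_coding = start_base + coding_bases - 1
    return codons, last_coding, (last_coding + 1, last_coding + 3)


motif_starts = one_based_starts(CONSENSUS, FIG2_START)
atg_start = FIG2_START.find("ATG") + 1
codons, last_coding, stop_span = orf_coordinates(atg_start, 4089)
print(motif_starts, atg_start, codons, last_coding, stop_span)
```

Under these assumptions the motif starts at base 12, the ATG at base 18, the 4089 coding bases correspond to 1363 codons, and the stop codon falls at bases 4107 to 4109, which agrees with the numbering of the last line of Fig. 2.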

Fig. 2. Nucleotide sequence of the S gene and its deduced amino acid sequence. The nucleotide sequence shown begins with the TAG
termination codon of the HE gene (underlined) 17 bases upstream of the presumed S start site (7407 bases from the poly(A) tail), and ends with
the TAA termination codon of the S protein. The first three amino acids of the putative 4.9-kDa protein are shown beginning at base position
4099. Consensus CYAAAC sequences are boxed. The presumed amino-terminal signal peptide and carboxy-terminal anchor sequences are
underlined. Potential N-linked glycosylation sites (NXS or NXT, where X ≠ P) are boxed. The proteolytic cleavage site separating S1 and S2 is
identified with an arrow. The extended sequence of amino acids missing in MHV JHM is identified by individually underlined amino acids, and
that missing in MHV A59, by asterisks.

Five features of the deduced BCV S protein reflect
the properties of four other coronavirus spike proteins
that have been characterized to date from nucleotide
sequence data (7, 2, 75, 20, 22, 27, 29, 32). (i) There is
an N-terminal hydrophobic stretch of amino acids
which predicts a signal peptide with a cleavage site between
amino acids 17 and 18 (38). (ii) There are 19 potential
asparagine-linked glycosylation sites that could
give rise to the only kind of glycosylation demonstrated
for this protein (Hogue and Brian, unpublished data;
10). (iii) There is a hydrophobic stretch of 17 amino
acids near the C terminus that could serve as a stop-transfer
and anchor sequence. (iv) There is a stretch of
8 amino acids on the immediate N-terminal side of the
predicted anchor sequence (-K-W-P-W-Y-V-W-L-, beginning
with amino acid 1305) that is identical in all coronavirus
S proteins sequenced to date. (v) There is a
cysteine-rich hydrophilic C terminus of 35 amino acids
that is probably the intravirion domain. In common with
MHV (22, 32) and IBV (7, 2, 20, 27), but not in common
with TGEV (75, 29; Tung and Brian, unpublished) and
FIPV (75), is also an internal sequence of basic amino
acids that, in the case of MHV and IBV, lies on the immediate
N-terminal side of the protease cleavage site
(6, 22). In BCV the sequence is K-R-R-S-R-R beginning
with amino acid 763, and, on the basis of the pattern
in MHV and IBV, predicts a cleavage between amino
acids 768 and 769 (note arrow in Fig. 2). Cleavage at
this point would divide the unglycosylated S protein
into an N-terminal segment of 85,690 Da (S1) and a C-terminal
segment of 65,153 Da (S2).
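
The subunit sizes follow from the cleavage position and can be checked for internal consistency: hydrolysis of one peptide bond adds one water molecule, so the two fragment masses should sum to the precursor mass plus about 18 Da. A minimal sketch in Python, using only figures quoted in the text (the variable names are illustrative, and the mass of water is rounded to 18 Da):

```python
# Values quoted in the text for the unglycosylated S protein.
PRECURSOR_LENGTH = 1363      # amino acids
PRECURSOR_MASS = 150_825     # Da
S1_MASS = 85_690             # Da
S2_MASS = 65_153             # Da
MOTIF = "KRRSRR"
MOTIF_START = 763            # 1-based position of the first residue (K)
WATER = 18                   # Da, rounded mass of H2O (assumed rounding)

# Cleavage falls immediately after the last residue of the basic motif.
s1_last = MOTIF_START + len(MOTIF) - 1
s2_first = s1_last + 1
s1_length = s1_last
s2_length = PRECURSOR_LENGTH - s1_last

# Difference between the summed fragments and the intact precursor.
mass_gap = S1_MASS + S2_MASS - PRECURSOR_MASS

print(s1_last, s2_first, s1_length, s2_length, mass_gap == WATER)
```

The motif ends at residue 768, so S1 spans residues 1 to 768 and S2 the remaining 595 residues, and the quoted masses differ from the precursor by exactly one water.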

From amino acid sequencing studies, no N-terminal 
sequence could be obtained from the virion-derived 
120-kDa subunit, possibly because of N-terminal
blockage. The N-terminal sequence of the 100-kDa
subunit could be obtained, however, and was deter- 
mined to be X-I-T-T-G-Y-X-F-, identifying the first amino 
acids downstream from the predicted internal cleavage 
site. These results confirmed the predicted internal 
cleavage site and established that the 120-kDa subunit 
is S1 and the 100-kDa subunit is S2. 
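
The agreement between the Edman result and the predicted S2 amino terminus can be illustrated as follows. This Python sketch translates bases 2304 to 2345 of Fig. 2 (codons 763 to 776) with the standard genetic code and compares the residues after the predicted cleavage site with X-I-T-T-G-Y-X-F, treating X as an unidentified residue. The fragment is transcribed from the figure with its last codon read as TTT, which is an assumption, and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Codons 763-776 of the S ORF (assumption: transcribed from Fig. 2).
FRAGMENT = "AAAAGACGAAGTCGTAGAGCGATTACCACTGGTTATCGGTTT"
FRAGMENT_FIRST_RESIDUE = 763


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def matches_edman(predicted, observed):
    """Compare residues position by position; 'X' in observed matches anything."""
    return len(predicted) >= len(observed) and all(
        o == "X" or o == p for p, o in zip(predicted, observed)
    )


peptide = translate(FRAGMENT)
s2_start = 769 - FRAGMENT_FIRST_RESIDUE      # 0-based index of residue 769
predicted_s2_nterm = peptide[s2_start:]
edman = "XITTGYXF"
print(peptide, predicted_s2_nterm, matches_edman(predicted_s2_nterm, edman))
```

With this fragment the translation is KRRSRRAITTGYRF, and the eight residues starting at position 769 (AITTGYRF) agree with the Edman sequence at every identified position.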

[Figure 3 axis: amino acid position]
Fig. 3. Structural comparison of the S proteins of MHV-JHM, MHV-A59, and BCV. Sequences are aligned for maximum homology. A sequence
found in BCV but not found in MHV-JHM or MHV-A59 is expressed as a gap (broken line) in the MHV sequences. Putative N-terminal signal
peptides and C-terminal anchor sequences are boxed. Vertical lines above the sequence indicate potential asparagine-linked glycosylation
positions, and below the sequence, cysteine positions. The identified (BCV, MHV-A59) and putative (MHV-JHM) proteolytic cleavage sites are
identified by arrows.

The BCV and MHV S proteins show remarkable sequence
homology, suggesting that these viruses are recently
diverged. After aligning sequences for maximal
homology, the following points emerge. (i) Relative to 
BCV, a large deletion appears in the MHV S1 subunits. 
For JHM it is a contiguous gap of 138 amino acids, and 
for A59 it is a discontiguous gap of 50 amino acids
(Figs. 2 and 3). The function of the additional sequence
in the BCV S1 subunit is not known, but it is possibly a
structure that interacts in some way with the HE glyco- 
protein, a structural protein not found on MHV (73, 34) 
except under certain rare conditions (33). No electron 
micrographic or chemical data exist, however, to sug- 
gest that S and HE do physically interact (3, 77, 78). It 
is interesting to note that the entire region in the BCV 
S protein corresponding to the gap region of the JHM 
S protein is especially rich in cysteine residues and 
contains 15 (26%) of the 56 total cysteines in the BCV 
S protein (Figs. 2 and 3). This suggests that this part 
of the molecule may be important for intramolecular or 
intermolecular disulfide linkages. (ii) Exclusive of the 
large gap in the MHV sequences, the S1 subunits of
JHM and A59 show 62 and 60% identity, respectively, 
with BCV, and the S2 subunits show 75 and 74%, re- 
spectively. Throughout the S protein, 41 of 56 cysteine 
positions and 13 of 19 potential N-linked glycosylation 
sites are conserved. The internal proteolytic cleavage 
position (not yet confirmed for JHM) is also conserved. 
The pattern of greater amino acid sequence diver- 
gence in the S1 subunit is consistent with the model of 
Cavanagh (4) and De Groot et al. (7) which proposes
that the S1 subunit comprises the exposed bulbous
structure of the spike and probably contains most (5), 
but not all (23, 36), of the neutralizable antigenic sites. 
It is the structure most likely to undergo changes as a 
result of immunologic selective pressures. 
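
The identity figures in this comparison are computed over aligned positions only, with the large MHV deletion left out. A minimal Python sketch of that kind of calculation follows; the two short aligned strings are invented for illustration and are not taken from the BCV or MHV sequences, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    identical = sum(a == b for a, b in compared)
    return 100.0 * identical / len(compared)


# Toy alignment (hypothetical sequences): the 4 gap columns are excluded.
toy_a = "MKCLTTVSINDVCCAY"
toy_b = "MRCLTS----DVCSAY"
identity = percent_identity(toy_a, toy_b)

# Conservation ratios quoted in the text, expressed as percentages.
cysteines_conserved = 100.0 * 41 / 56
sequons_conserved = 100.0 * 13 / 19
print(round(identity, 1), round(cysteines_conserved), round(sequons_conserved))
```

For the toy alignment, 9 of the 12 ungapped columns match (75% identity). The conserved cysteine and glycosylation-site counts quoted in the text correspond to roughly 73% and 68%, respectively.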

Fusion of cells in culture is one biological activity associated
with cleavage of the MHV S protein (35). Despite
its extensive sequence similarity with the MHV S
protein, however, the BCV S protein shows little fusion
activity. In fact, fusion is a behavior we have not observed
with the Mebus strain of BCV even though the
S protein is primarily in the cleaved form on the virion 
(73, 17). It is not clear why BCV and MHV behave so 
differently in their fusogenic properties, but functional 
evaluation of sequence differences near the cleavage 
sites of these two viruses may aid in clarifying the 
mechanisms of fusion by MHV. This is especially inter- 
esting since hydrophobic regions, common at the 
cleavage sites on fusion proteins of paramyxoviruses 
and myxoviruses, are absent in the MHV S protein (22) 
and different mechanisms of fusion may be employed. 